Abstract


This paper introduces DeepUnifiedMom, a deep learning framework that 
enhances portfolio management through a multi-task learning approach and 
a multi-gate mixture of experts. The essence of DeepUnifiedMom lies in its 
ability to create unified momentum portfolios that incorporate the dynamics 
of time series momentum across a spectrum of time frames—a feature often 
missing in traditional momentum strategies. Our comprehensive backtest- 
ing, encompassing diverse asset classes such as equity indexes, fixed income, 
foreign exchange, and commodities, demonstrates that DeepUnifiedMom con- 
sistently outperforms benchmark models, even after factoring in transaction 
costs. This superior performance underscores DeepUnifiedMom’s capability 
to capture the full spectrum of momentum opportunities within financial 
markets. The findings highlight DeepUnifiedMom as an effective tool for 
practitioners looking to exploit the entire range of momentum opportunities. 
It offers a compelling solution for improving risk-adjusted returns and is a 
valuable strategy for navigating the complexities of portfolio management. 
1. Introduction


Time-series momentum (TSMOM) strategies are a systematic approach
in finance that leverages the persistence of asset returns over time. These
strategies aim to exploit the continuation of underlying trends by establishing
long positions during uptrends and short positions during downtrends
(Jegadeesh & Titman, 1993, 2001). The concept of momentum has garnered
extensive attention in the financial literature, underscoring its importance.
Prior research highlights the effectiveness of TSMOM,
showcasing impressive risk-adjusted returns from simply buying assets with
positive past 12-month returns. Further studies across various asset classes
corroborate these findings, emphasizing the robustness of TSMOM strategies
(2016; 2018; 2017; 2020). At the core of TSMOM strategies is volatility
scaling, a crucial method for managing exposure to market volatility
(Baltas & Kosowski, 2012). Through adjustments in exposure levels during periods
of low and high volatility, TSMOM strategies effectively mitigate the risk of
significant losses during market turbulence (2020). This approach
has proven invaluable in enhancing Sharpe ratios, curbing extreme tail
returns, and limiting maximum drawdowns in portfolios of risky assets
(2023).

However, TSMOM strategies often fail to account for the interactions be- 
tween different assets within a portfolio (2023). This 
oversight can lead to excessive risk exposure, as these strategies treat each 
asset in isolation without considering how they correlate and interact. For 
example, the simultaneous momentum trends in equities, commodities, and 
currencies can amplify overall portfolio risk if not managed cohesively. Over- 
looking these interactions can diminish diversification benefits and increase 
the likelihood of significant drawdowns during market turbulence. 

Moreover, various asset classes, and even individual assets within those 
classes, exhibit distinct momentum dynamics with differing trend speeds. 
This variation makes risk allocation challenging, as applying a one-size-fits-all
approach to trend speed can result in less-than-ideal investment outcomes
(2023). To address this, some practitioners implement multiple TSMOM portfolios,
each tailored to a specific trend speed, and distribute capital among them
(2023). Nevertheless, this approach can


still result in inefficient capital distribution, indicating a need for more re- 
fined methods that can adeptly navigate the varying momentum speeds of 
different assets and asset classes for more effective capital allocation. 

We propose an approach to bridge the research gaps identified above, inspired
by recent advancements in deep learning for portfolio construction. The
proposed method utilizes a deep Multi-task Learning framework combined with a
Multi-gate Mixture-of-Experts architecture to develop a momentum portfolio
(Ma et al., 2018). Since its development more than three decades ago, the
Mixture-of-Experts (MoE) approach has become foundational in numerous research
areas and has recently been pivotal in advancing fields such as natural
language processing, particularly in large language models (LLMs)
(Fedus et al., 2022; Zoph et al., 2022; He et al., 2022; Gale et al., 2022;
Shen et al., 2023). Our approach, which we call DeepUnifiedMom, aims to
seamlessly capture momentum opportunities across various speeds, enhancing
the efficacy of traditional momentum strategies. Our main contributions are:
i) We introduce a novel Multi-task Learning framework with a Multi-gate
Mixture-of-Experts architecture, which facilitates end-to-end learning for
multi-period portfolio construction to enhance momentum portfolio
performance. ii) This study represents the first implementation and
examination of Multi-task Learning combined with a Multi-gate
Mixture-of-Experts approach specifically within portfolio construction.
iii) We provide a comprehensive experimental analysis to evaluate and
understand the performance of this methodology against existing momentum
strategies and various portfolio construction techniques.
The paper is organized as follows. In Section 2, we discuss existing work
on constructing both classical and deep-learning momentum portfolios.
Section 3 presents the proposed DeepUnifiedMom model. Next, in Section 4,
we present the setup of the different experiments, including a description of
the dataset, benchmark models, and proposed backtesting strategy. In
Section 5, the results of our experiments are presented. Finally, in Section 6, we
summarize our findings and suggest directions for future research.


2. Related Work 


Research in deep learning in finance is well-established, with numerous 
studies leveraging these techniques to enhance prediction accuracy, portfolio 
optimization, and risk assessment. Zhao & Yang (2023) introduce a hybrid
model, SA-DLSTM, combining emotion-enhanced convolutional neural networks
(ECNN), denoising autoencoders (DAE), and long short-term memory
(LSTM) models to predict stock price movements by analyzing sentiment
from user-generated comments. Another study proposes a portfolio construction
model integrating the KMV model with a multiobjective water cycle
algorithm, enhancing portfolio evaluation and stability by incorporating
financial data from listed companies. A further study addresses conservatism
in worst-case robust portfolio optimization by suggesting hybrid models that
use LSTM and XGBoost to forecast market movements and generate
hyperparameters for modeling. Other authors develop a multiagent-based
deep reinforcement learning framework for portfolio management, featuring
a two-level nested agent structure and a custom reward function to optimize
trading decisions and risk transfer behaviors. Lastly, a comprehensive
survey of deep learning applications in finance categorizes models and
identifies future research opportunities. These studies
collectively demonstrate advancements in applying deep learning to financial
applications, highlighting the potential for innovative techniques to improve
financial model robustness and performance. However, none of these studies
focuses specifically on momentum portfolio construction.
Moskowitz et al. (2012) introduced the concept of time-series momentum
(TSMOM), showing that the excess returns of an asset over the past 12
months strongly predict its future performance. Since its introduction,
it has become a conventional approach for practitioners to implement
momentum-based portfolios (Asness et al., 2014; Hurst et al., 2017; Baltas &
Kosowski, 2021). In recent years, the use of deep learning in portfolio
construction has [...] ([...] et al., 2020).
[...] proposed a deep-learning architecture that improves upon
traditional time-series momentum and mean-reversion strategies. With multiple
attention heads (Vaswani et al., 2017), it tracks diverse market regimes
across timescales and offers interpretability by highlighting influential factors 
and key time steps, refining trading strategies. The attention mechanism 
helps to enhance learning of long-term dependencies and adaptability to new 
market conditions like the SARS-CoV-2 crisis. Deep multi-task learning (MTL)
was first applied to momentum portfolio construction through auxiliary tasks
explicitly related to volatility forecasting (Parkinson, 1980; Garman & Klass,
1980; Rogers & Satchell, 1991; Yang & Zhang, 2000). The findings highlighted that
a comprehensive MTL framework, encompassing all proposed auxiliary tasks, not
only enhances the risk-adjusted performance of portfolios but also notably 
reduces maximum drawdowns compared to deep learning models without 
auxiliary tasks or those with a single auxiliary task. This approach under- 
scores the critical role of effectively selecting auxiliary tasks in MTL settings 
to optimize portfolio outcomes. 

Despite extensive research in the field, a significant gap persists in the 
development of a unified momentum portfolio capable of adapting to trends 
of varying speeds. Previous studies have typically focused independently 
on fast (less than one month), medium (three to six months), or slow (six 
months to one year) momentum strategies, rather than integrating these 
approaches into a single, dynamic framework. A unified momentum approach 
aims to capture momentum across this spectrum, seamlessly adjusting to 
the changing pace of trends within assets and asset classes in the portfolio. 
Addressing this gap in the literature represents a crucial advancement that 
could significantly enhance our understanding of momentum-based trading 
and lead to more robust portfolio management strategies. By developing such 
a portfolio, we can better exploit the full range of momentum opportunities 
in financial markets, resulting in superior risk-adjusted returns compared to 
existing momentum strategies in the literature. 

This work aims to bridge this research gap by presenting a deep learn- 
ing approach to constructing a unified momentum portfolio. We begin by 
applying the principles of multi-task learning to train three task-specific net- 
works, each specializing in predicting the forward momentum signal score for 
one month, three months, and six months ahead. These networks generate 
portfolios representing fast, medium, and slow momentum strategies. The 
outputs from these task-specific networks are then fed into a final network, 
the Capital Allocation Module, which determines the weight allocation for 
each portfolio. By distributing risk according to the allocations provided 
by the Capital Allocation Module, we create a unified momentum portfolio 
that effectively exploits and leverages both short-term and long-term trends 
across assets and asset classes in the portfolio. 


3. Methodology 


3.1. Overview 


This work proposes to combine a Multi-Gate Mixture of Experts with a
Multi-Task Learning architecture (Ma et al., 2018). This approach
combines Long Short-Term Memory (LSTM) (Hochreiter & Schmidhuber,
1997) modules with the Multi-Gate Mixture of Experts (MoMME) framework
(Jacobs et al., 1991) to construct a unified momentum portfolio.
Figure 1: The figure illustrates the proposed architecture, highlighting the flow from shared
LSTM experts through task-specific gating and FNN layers and culminating in a final 
FNN that determines portfolio weights. These weights allocate risk across the portfolios 
generated by the task-specific FNN layers, which include three momentum portfolios (Fast, 
Mid, and Slow), each tailored to different trend speeds. The overarching goal of the final 
unified momentum portfolio is to capitalize on diverse market trends strategically. The 
symbol ⊗ followed by ⊕ denotes the weighted sum of the outputs by the gating network
with either the LSTM experts or the FNN task-specific network output. 


In our proposed architecture, the LSTM experts serve as shared layers 
forming the backbone of our multi-task learning framework. This setup enables
effective parameter sharing across various task-specific pathways, yielding
two key benefits.
Firstly, the shared LSTM experts facilitate a more efficient learning process 
by leveraging commonalities among tasks and consolidating learning efforts. 
This approach accelerates the training process and enhances overall model 
performance by drawing on a broader base of data insights. Secondly, the 
use of shared experts enhances the model’s generalization capabilities. By 
exposing the model to a variety of tasks within the same learning process, it 


becomes less prone to overfitting on any single task (Ghosn & Bengio, 1996;
2000; 2017; Liebel & Körner, 2018). Overall, the integration
of shared LSTM experts within our architecture underscores a strategic
approach to harnessing the complexities of financial data. 
Each task-specific network has a dedicated gating network, a cornerstone 


of the Multi-Gate Mixture of Experts (MoMME) framework (Ma et al.,
2018). In our work, these gating networks are specialized one-layer FNNs
with a softmax activation function. They receive the same feature inputs 
as the LSTM experts and output a set of weights that sum to one. These 
weights determine the reliance on the corresponding LSTM experts. By se- 
lectively activating relevant LSTM experts, the gating networks enhance the 
performance of task-specific networks. This selective activation enables more 
effective learning, as each task-specific network can focus on constructing 
momentum portfolios tailored to specific speeds or timeframes, ultimately 
improving the performance of the constructed momentum portfolios. Here, 
three task-specific networks are trained to minimize the root mean square 
error (RMSE) between their outputs and the forward-looking TSMOM signals
over one-month, three-month, and six-month horizons. The
outputs of these task-specific networks yield the fast, medium, and slow mo- 
mentum portfolios. Finally, we have a task-specific network called the Capital 
Allocation Network, supported by a gating network. The gating network re- 
ceives feature inputs and assigns appropriate weights to the outputs of each 
task-specific network. The weighted outputs are then fed into the Capital 
Allocation Network, which is trained to allocate weights to the fast, medium, 
and slow momentum portfolios generated by the three preceding task-specific 
networks. This results in a final unified momentum portfolio that strategi- 
cally capitalizes on opportunities across various market trends. 
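
The gating step described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the expert outputs and gate logits below are made-up numbers, the helper names `softmax` and `mix_experts` are hypothetical, and the LSTM experts and one-layer FNN gates that would produce these values in the actual model are not reproduced here.

```python
import math

def softmax(logits):
    """Numerically stable softmax; the outputs are positive and sum to one."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mix_experts(expert_outputs, gate_logits):
    """Weighted sum of expert output vectors using softmax gate weights.

    expert_outputs: list of K equal-length vectors, one per shared expert.
    gate_logits:    list of K raw scores produced by one task's gate.
    """
    weights = softmax(gate_logits)
    dim = len(expert_outputs[0])
    return [
        sum(w * out[j] for w, out in zip(weights, expert_outputs))
        for j in range(dim)
    ]

# Three hypothetical experts, each emitting a 2-dimensional representation.
experts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Each task has its own gate, so the same experts are mixed differently.
gate_logits_by_task = {
    "fast": [2.0, 0.0, 0.0],
    "medium": [0.0, 2.0, 0.0],
    "slow": [0.0, 0.0, 2.0],
}
task_inputs = {
    task: mix_experts(experts, logits)
    for task, logits in gate_logits_by_task.items()
}
```

Because every gate sees the same experts but applies its own weights, the fast, medium, and slow pathways receive different mixtures of one shared representation.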

The output produced by the Capital Allocation Network serves as a set 
of weights assigned to the fast, medium, and slow momentum portfolios gen- 
erated by the preceding task-specific networks. These weights determine the 
allocation of capital across the different portfolios, reflecting the model’s 
strategic decisions on how to distribute resources among various market 
trends. By optimizing these weights, the Capital Allocation Network aims to 
construct a final unified momentum portfolio that effectively captures oppor- 
tunities across diverse market conditions. Essentially, the portfolio weights 
determined by the Capital Allocation Network represent the model’s assess- 
ment of the relative importance and potential profitability of each momen- 
tum portfolio. In summary, the output of the Capital Allocation Network, 
in conjunction with the portfolio weights for the fast, medium, and slow mo- 
mentum portfolios, collectively yield the final unified momentum portfolio. 
This integrated approach enables the model to adaptively allocate resources 
and strategically capitalize on market trends, ultimately enhancing portfolio 
performance. 


To sum up, our proposed deep multi-gate mixture-of-experts, multi-task
learning architecture generates three distinct momentum portfolios, each
designed to capture momentum at a different timeframe. The Capital Allocation
Network then allocates weights to these three momentum portfolios,
constructing the final unified momentum portfolio. This approach enables the
creation of a unified momentum portfolio in a single step, with the model op- 
timized in an end-to-end fashion. A single-step, end-to-end optimized model 
is superior because it ensures that all components are trained simultaneously, 
allowing for seamless integration and interaction between different parts of 
the model. This approach enhances overall performance by effectively cap- 
turing dependencies and relationships within the data, resulting in a more 
cohesive and robust final portfolio. Our experimental results, reported
later in the paper, substantiate this claim.


3.2. Multi-Task Learning Network 

By categorizing momentum into fast (one-month), medium (three-month), 
and slow (six-month) categories, each task-specific network within our Multi- 
Task Learning framework is trained to construct portfolios that capture mo- 
mentum at their respective time frame. This segmentation enhances the 
model’s ability to detect and leverage the subtle variations in momentum 
within each asset class and individual asset, thereby improving the precision 
and relevance of its predictions for future momentum returns across diverse 
assets. Each task-specific network is represented by a Feedforward Neural 
Network (FNN), trained to predict the forward-looking time-series momen- 
tum (TSMOM) signal. The TSMOM signal is essentially the forward return 
of an asset, adjusted for its volatility. This adjustment takes into account 
the risk associated with the asset, providing a normalized measure of mo- 
mentum that is more comparable across different assets. During the training 
process, the objective is to minimize the difference between the predicted 
TSMOM signal and the actual forward-looking TSMOM signal (the ground
truth, denoted as $\hat{g}_t^i$). Specifically, we minimize the Root Mean Squared Error
(RMSE) between the predicted output $y_t^i$ and the ground truth $\hat{g}_t^i$. The
RMSE is a commonly used metric for regression tasks, providing a measure 
of the average magnitude of the errors between predicted and actual values. 


\[
L_{RMSE} = \frac{1}{|B|} \sum_{t \in B} l_{RMSE_t}, \tag{1}
\]

where,

\[
l_{RMSE_t} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\hat{g}_t^i - y_t^i\right)^2}
\]

where $L_{RMSE}$ represents the overall RMSE loss over a batch $B$ of time steps. The
term $l_{RMSE_t}$ denotes the RMSE at a specific time $t$ in the batch. In this context,
$n$ is the total number of assets in the portfolio and $y_t^i$ is the prediction output
by the task-specific network for the forward-looking TSMOM signal for asset
$i$ at time $t$. The forward-looking TSMOM signal for each asset $i$
at time $t$, denoted as $\hat{g}_t^i$, is the ground truth, i.e. the actual observed
value, which is compared against the predicted output $y_t^i$ during the
training phase of the model. It is defined as:


\[
\hat{g}_t^i = TSMOM_t^i = \frac{r^i_{t+1,t+s}}{\sigma^i_{t+s}} \tag{2}
\]


In this formulation, $r^i_{t+1,t+s}$ represents the returns from $t+1$ to $t+s$,
and $\sigma^i_{t+s}$ is the standard deviation of the returns, both calculated over a future
window s beyond time t. Here, the window s varies according to the speed 
category of the momentum being analyzed: 20 trading days for DeepUnified- 
Mom(Fast), 60 trading days for DeepUnifiedMom(Medium), and 120 trading 
days for DeepUnifiedMom(Slow). This variation allows each task-specific net- 
work to fine-tune its learning process to the particular momentum time frame 
it addresses, thus enhancing the model’s ability to adapt to the distinct mar- 
ket dynamics associated with each speed category. Finally, the return of the 
fast, medium and slow portfolio can be calculated as follows: 


\[
r^p_{t+1} = \frac{1}{n} \sum_{i=1}^{n} y^{p,i}_t \times r^i_{t+1} \tag{3}
\]


where $p$ denotes the DeepUnifiedMom-Fast, Medium or Slow portfolio, $n$
is the number of assets in the portfolio, $y^{p,i}_t$ is the output of the task-specific
network (indicating the weight allocated to asset $i$ for the given momentum
timeframe) at time $t$, $r^i_{t+1}$ is the one-day return of asset $i$, and $r^p_{t+1}$ is
the return of portfolio $p$ at time $t+1$.
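
The quantities in Equations (1) to (3) can be illustrated with a small plain-Python sketch. It is not the authors' implementation: the function names are hypothetical, the window return is assumed to be the sum of daily returns, and the volatility is assumed to be the sample standard deviation over the same forward window, since the text does not specify either detail.

```python
import math
import statistics

def forward_tsmom(daily_returns, t, s):
    """Eq. (2): return over days t+1..t+s divided by the std of those returns.

    Assumption: the window return is the sum of daily returns and the
    volatility is the sample standard deviation over the same window.
    """
    window = daily_returns[t + 1 : t + 1 + s]
    if len(window) < s:
        raise ValueError("not enough future data for this horizon")
    return sum(window) / statistics.stdev(window)

def rmse_at_t(targets, predictions):
    """RMSE across the n assets at a single time step."""
    n = len(targets)
    return math.sqrt(sum((g - y) ** 2 for g, y in zip(targets, predictions)) / n)

def batch_rmse(batch_targets, batch_predictions):
    """Eq. (1): average of the per-time-step RMSE values over the batch."""
    per_step = [rmse_at_t(g, y) for g, y in zip(batch_targets, batch_predictions)]
    return sum(per_step) / len(per_step)

def portfolio_return(positions, next_day_returns):
    """Eq. (3): equal-weighted average of position times next-day return."""
    n = len(positions)
    return sum(y * r for y, r in zip(positions, next_day_returns)) / n

# Toy inputs: one asset's daily returns, and a two-asset batch of size one.
returns = [0.01, -0.02, 0.03, 0.01, -0.01, 0.02]
target = forward_tsmom(returns, t=0, s=4)
loss = batch_rmse([[1.0, -1.0]], [[0.5, -0.5]])
pnl = portfolio_return([1.0, -1.0], [0.01, -0.02])
```

In the paper's setting `s` would be 20, 60, or 120 trading days depending on the speed category; the toy call uses `s=4` only to keep the example short.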


3.3. Capital Allocation Network 


The objective of the Capital Allocation Network (CAN) is to generate 
weights for allocating capital across the various momentum portfolios pro- 
duced by the task-specific networks. It is implemented as a specialized feed- 
forward neural network (FNN) with a tanh activation function at each inter- 
mediate layer and a softmax activation function at the final layer to ensure 
the output sums to 1. By applying the weights generated by the CAN to 
DeepUnifiedMom-Fast, Medium and Slow, we obtain the final unified momen- 
tum portfolio, which we term DeepUnifiedMom(CAN). The unified momentum 
portfolio’s return can be calculated as follows: 


\[
r^u_{t+1} = \sum_{p \in P} w^p_t \times r^p_{t+1} \tag{4}
\]


where $P$ is the set of fast, medium and slow momentum portfolios, $w^p_t$ is
the weight predicted by the CAN for portfolio $p$ at time $t$, $r^p_{t+1}$ is the
one-day return of portfolio $p$, and $r^u_{t+1}$ is the return of the unified momentum
portfolio.
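
As a rough illustration of Equation (4), the sketch below turns three raw scores into allocation weights with a softmax and applies them to the sub-portfolio returns. The logits and returns are invented numbers and `unified_return` is a hypothetical helper; the intermediate tanh layers of the CAN are omitted.

```python
import math

def softmax(logits):
    """Numerically stable softmax; the outputs are positive and sum to one."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def unified_return(can_logits, sub_portfolio_returns):
    """Eq. (4): softmax allocation weights times next-day sub-portfolio returns.

    Returns the unified portfolio return together with the weights used.
    """
    weights = softmax(can_logits)
    total = sum(w * r for w, r in zip(weights, sub_portfolio_returns))
    return total, weights

# Hypothetical final-layer scores for (fast, medium, slow) and their returns.
sub_returns = [0.004, -0.001, 0.002]
r_u, w = unified_return([0.2, 1.0, -0.5], sub_returns)
```

Because the softmax weights are positive and sum to one, the unified return is a convex combination of the three sub-portfolio returns and cannot be leveraged beyond the best or worst of them on any given day.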

Since the objective of the CAN differs from those of the task-specific
networks, we use the Sharpe Ratio as the objective function
for training. By utilizing the Sharpe Ratio, as outlined in Equation (5),
we direct the model to learn how to generate portfolios optimized for risk-
adjusted returns from the input features. The Sharpe Ratio measures an 
investment’s performance relative to a risk-free asset, adjusting for risk, and 
offers a comprehensive metric for evaluating the trade-off between risk and 
return. Incorporating this ratio into the model’s learning process ensures 
that the constructed portfolios aim to maximize returns while minimizing 


risk, resulting in superior risk-adjusted performance (Lim et al.;
Ong & Herremans, 2023):

\[
L_{\text{Sharpe Ratio}} = -\frac{E[r^u]}{\sigma_{r^u}} \tag{5}
\]

where $E[r^u]$ and $\sigma_{r^u}$ are the mean and standard deviation of the unified
momentum portfolio's realised returns, respectively.

Figure 2: Relationship between the Sharpe Ratio with Soft Capping Mechanism and the
Standard Sharpe Ratio with Threshold = 0.01

The high noise-to-signal ratio in financial data significantly increases the risk
of overfitting in deep learning models (Bailey et al.). During training, we may
encounter instances where certain batches have an extremely high noise-to-signal
ratio. Fitting to this noise can result in a high Sharpe Ratio. Consequently,
the model, aiming to maximize the Sharpe ratio, may overfit to
these noisy patterns, allocating more weight to these instances and perform- 
ing well on the training data but failing to generalize to new data. We 
introduce a modified objective function called the Sharpe Ratio with a Soft 
Capping Mechanism to mitigate this risk. It first caps the Sharpe ratio at 
a specified threshold value, ensuring that any value above this threshold is 
limited. For values that exceed this threshold, the function computes the 
excess and applies a logarithmic transformation, which reduces the impact 
of these extreme values by making them grow more slowly. Similarly, it 
ensures that the Sharpe ratio does not fall below the negative of this thresh- 
old by capping the lower end and applying a logarithmic transformation to 
values below the threshold. This combination of capping and logarithmic 
adjustments smooths out the extremes, as shown in the equation below: 
\[
L_{SR_{soft}} = -\left(L + \log(1 + U_e) - \log(1 - L_e)\right) \tag{6}
\]

where,

\[
\begin{aligned}
U &= \min(SR, \tau) \\
U_e &= \max(SR - \tau, 0) \\
L &= \max(U, -\tau) \\
L_e &= \min(SR + \tau, 0)
\end{aligned}
\]


Here, $SR_{soft}$ represents the modified Sharpe ratio with the soft capping
mechanism, $SR$ represents the original Sharpe ratio, and $\tau$ is the threshold, which
is set to 0.01 during the training process. The resulting Sharpe Ratio with
Soft Capping mechanism can be seen in Figure 2. The logarithmic transformation
applied to values exceeding the threshold moderates their growth,
reducing the impact of extreme values. This smoothing effect stabilizes the 
training process by preventing abrupt changes in the model’s behavior due 
to outliers. Consequently, the model is encouraged to focus on more con- 
sistent and reliable patterns in the data, leading to better generalization. 
As a result, the model is less likely to overfit to noise and more likely to 
capture true underlying signals that are relevant to the objective at hand. 
Our experimental results, detailed in Section 4, will substantiate the effectiveness
of training our proposed architecture with the Sharpe Ratio with a
Soft Capping Mechanism. This comparison with models trained without the 
mechanism will highlight the improvement of performance and generalization 
capabilities when using the Sharpe Ratio with the Soft Capping Mechanism. 
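
A minimal sketch of the soft capping idea follows, using only the Python standard library. It makes two assumptions that go beyond the text: the Sharpe ratio is computed without annualisation, and the lower excess is measured from $-\tau$ so that the function is the identity inside the band $[-\tau, \tau]$ and mirrors the upper side, which is how the prose describes the mechanism. The function names are hypothetical.

```python
import math
import statistics

def sharpe_ratio(returns):
    """Mean over sample standard deviation of realised returns (no annualisation)."""
    return statistics.mean(returns) / statistics.stdev(returns)

def soft_capped_sharpe(sr, tau=0.01):
    """Linear inside [-tau, tau], logarithmic growth outside that band."""
    upper = min(sr, tau)
    upper_excess = max(sr - tau, 0.0)
    lower = max(upper, -tau)
    # Assumption: the lower excess is measured from -tau, mirroring the upper side.
    lower_excess = min(sr + tau, 0.0)
    return lower + math.log(1.0 + upper_excess) - math.log(1.0 - lower_excess)

def soft_capped_sharpe_loss(returns, tau=0.01):
    """Negative soft-capped Sharpe ratio, suitable for minimisation."""
    return -soft_capped_sharpe(sharpe_ratio(returns), tau)

# A batch with a suspiciously high Sharpe ratio is rewarded far less than linearly.
raw = sharpe_ratio([0.01, 0.02, 0.03])
capped = soft_capped_sharpe(raw)
```

Inside the band the objective equals the plain Sharpe ratio; beyond it, each additional unit of Sharpe ratio contributes progressively less, so a single noisy batch cannot dominate the gradient signal.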


3.4. Loss function 


\[
L_{total} = L_{SR_{soft}} + \sum_{p \in P} L^p_{RMSE} \tag{7}
\]

Putting it all together, the final loss function of our model, which we 
minimize during training, can be written as Equation (7). In this equation, 
$L_{total}$ combines two components. The first component, $L_{SR_{soft}}$, represents the
loss for the CAN, calculated as the negative Sharpe ratio with a soft capping
mechanism. The second component is the sum of the RMSE losses $L^p_{RMSE}$ for
each task-specific network. The overall loss function integrates these elements
to guide the training process and optimize the model's performance.
4. Experimental Setup
4.1. Dataset 


Individual futures contracts are subject to expiration dates and vary- 
ing levels of liquidity, which can hinder the practical analysis of long-term 
trends. To overcome this challenge, we rely on the Pinnacle Data Corp CLC 
database as our primary data source for evaluating the proposed model. The 
data we used in our experimentation spans from January 1990 to December 
2023, providing daily data and encompassing over three decades of historical 
information. The Pinnacle Data Corp CLC database offers a comprehen- 
sive continuous price history for 49 futures contracts across diverse asset 
classes, such as commodities, currencies, fixed income, and equity index fu- 
tures. We leverage the continuous contract history of each asset, constructed 
through end-to-end concatenation and price adjustment using the backward- 
ratio method, to ensure robust analysis of long-term trends. 


4.2. Feature Set 


We derive a set of time-series momentum features from the daily settlement
price of the continuous futures by taking the log returns ($r^i_{t-d,t}$) over the past
3, 5, 10, 21, 63, 126, and 252 trading days:


\[
r^i_{t-d,t} = \ln \frac{P^i_t}{P^i_{t-d}} \tag{8}
\]

where $r^i_{t-d,t}$ is the $d$-day log return of asset $i$ at day $t$,
$P^i_t$ is the settlement price of asset $i$ at time $t$, and $P^i_{t-d}$ is the settlement
price of asset $i$ $d$ trading days before $t$. To normalize the returns and account for the
variability in market conditions, we scale the calculated log returns by the 
asset’s volatility. This approach ensures that the returns are standardized, 
allowing for a more equitable comparison across different assets and time 


periods. The scaled return, $\tilde{r}^i_{t-d,t}$, is computed as follows:

\[
\tilde{r}^i_{t-d,t} = \frac{r^i_{t-d,t}}{\sigma^i_{t-d,t}} \tag{9}
\]

where $\tilde{r}^i_{t-d,t}$ represents the volatility-normalized return over the $d$-day period
for asset $i$ at day $t$, $r^i_{t-d,t}$ is the log return as previously defined, and $\sigma^i_{t-d,t}$
denotes the volatility of the asset over the $d$-day period.
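
The feature construction in Equations (8) and (9) might look roughly like the sketch below. The volatility estimator is an assumption: the text only says "the volatility of the asset over the d-day period", so this example uses the sample standard deviation of daily log returns inside the window, scaled by the square root of `d`. The price series is synthetic and the function names are hypothetical.

```python
import math
import statistics

LOOKBACKS = (3, 5, 10, 21, 63, 126, 252)

def log_return(prices, t, d):
    """Eq. (8): natural log of the price ratio over the last d trading days."""
    return math.log(prices[t] / prices[t - d])

def scaled_return(prices, t, d):
    """Eq. (9): d-day log return divided by a d-day volatility estimate.

    Assumption: volatility is the sample std of the d daily log returns in
    the window, scaled by sqrt(d); the text does not give the estimator.
    """
    daily = [math.log(prices[k] / prices[k - 1]) for k in range(t - d + 1, t + 1)]
    vol = statistics.stdev(daily) * math.sqrt(d)
    return log_return(prices, t, d) / vol

def feature_vector(prices, t, lookbacks=LOOKBACKS):
    """Volatility-scaled returns for every lookback that fits in the history."""
    return {d: scaled_return(prices, t, d) for d in lookbacks if t - d >= 0}

# Synthetic price path: a gentle upward drift with a deterministic wiggle.
prices = [100.0 * math.exp(0.001 * k + 0.01 * math.sin(k)) for k in range(300)]
features = feature_vector(prices, t=299)
```

Each asset and day would yield one such seven-element feature set, one entry per lookback window.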




Following the approach of Ong & Herremans (2023), our methodology
for feature creation is guided by two principal considerations. Firstly, we
aim to preserve the essence of time-series momentum by utilizing features
that align closely with those employed in the construction of time-series
momentum portfolios, as outlined by Moskowitz et al. (2012). Secondly, and
of greater significance, we intentionally limit the complexity of our feature 
engineering to ensure that the observed performance of the portfolios is pri- 
marily attributed to the efficacy of our architectural design, rather than to 
the ingenuity or specificity of the features used. This approach underscores 
our commitment to validating the inherent strength and adaptability of the 
architecture in capturing momentum trends, rather than leveraging elaborate 
feature engineering to enhance portfolio performance artificially. 


4.3. Benchmark Models 
The concept of Time-Series Momentum (TSMOM) portfolios, as introduced
by Moskowitz et al. (2012), forms the cornerstone of our benchmarking
process. These portfolios operate on the principle of buying or selling assets
based on their performance over the past 12 months. To comprehensively
assess our model's efficacy, we construct a suite of TSMOM
portfolios, each tailored to capture a distinct momentum horizon:


• TSMOM(1): based on the past one month's returns.

• TSMOM(3): based on the past three months' returns.

• TSMOM(6): based on the past six months' returns.

• TSMOM(12): based on the past twelve months' returns.

• TSMOM(1,4): An equal-weighted combination of the 1, 2, 3, and 4-month
TSMOMs.

• TSMOM(5,8): An equal-weighted combination of the 5, 6, 7, and 8-month
TSMOMs.

• TSMOM(9,12): An equal-weighted combination of the 9, 10, 11, and
12-month TSMOMs.

• TSMOM(1,12): An equal-weighted combination of the 1 to 12-month
TSMOMs.
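
The benchmark definitions in the list above can be sketched as follows. This is a simplified illustration, not the benchmark code used in the paper: it uses only the sign of the trailing cumulative return and omits the volatility scaling that TSMOM portfolios normally apply, and it treats a flat trailing return as a short signal. The function names and the return history are made up.

```python
def tsmom_signal(monthly_returns, k):
    """+1 if the cumulative return over the last k months is positive, else -1."""
    growth = 1.0
    for r in monthly_returns[-k:]:
        growth *= 1.0 + r
    return 1.0 if growth > 1.0 else -1.0

def combined_tsmom_signal(monthly_returns, start, end):
    """Equal-weighted average of the k-month signals for k = start..end."""
    signals = [tsmom_signal(monthly_returns, k) for k in range(start, end + 1)]
    return sum(signals) / len(signals)

# Hypothetical asset: eight losing months followed by a four-month recovery.
history = [-0.05] * 8 + [0.01, 0.01, 0.01, 0.03]

fast = tsmom_signal(history, 1)                  # TSMOM(1)
slow = tsmom_signal(history, 12)                 # TSMOM(12)
short_blend = combined_tsmom_signal(history, 1, 4)   # TSMOM(1,4)
full_blend = combined_tsmom_signal(history, 1, 12)   # TSMOM(1,12)
```

On this history the fast signal is long while the slow signal is still short, which is the kind of disagreement across trend speeds that motivates combining horizons.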
This wide array of TSMOM portfolios serves as a benchmark, enabling
us to evaluate our proposed model against a spectrum of momentum-based
investment strategies across different timeframes. To support our claim that
a unified momentum portfolio can capitalize on momentum opportunities
across various timeframes, DeepUnifiedMom(CAN) should outperform
all TSMOM benchmark strategies. In addition, to rigorously evaluate
the performance of the final unified momentum portfolio constructed by
DeepUnifiedMom(CAN), we established two benchmark portfolios:
DeepUnifiedMom(EQWT) and DeepUnifiedMom(MVO). These benchmarks
help us assess the effectiveness of the unified momentum portfolio
constructed by the CAN compared to standard portfolio construction
techniques.


• DeepUnifiedMom(EQWT): This portfolio assigns equal weights to the
portfolios constructed by DeepUnifiedMom-Fast, Medium and Slow.

• DeepUnifiedMom(MVO): This portfolio utilizes Mean-Variance Optimization (MVO)
(Markowitz, 1952) to construct a portfolio that maximizes the Sharpe
ratio.


Constructing a portfolio with equal weighting is a straightforward heuristic
that does not involve any optimization. In contrast, constructing the final
portfolio with MVO involves a second optimization step: the historical
returns of the DeepUnifiedMom-Fast, Medium, and Slow portfolios are used
to estimate expected returns and the covariance matrix, and the final
weights are then chosen to maximize the Sharpe ratio. The key drawback of
such approaches is that they do not consider the interactions between the
components holistically. This fragmented optimization can lead to
suboptimal overall performance because each step is optimized in isolation,
without accounting for the interdependencies and joint effects of all
components. Finally, to substantiate our earlier claim that single-step,
end-to-end optimization of the final unified momentum portfolio is more
effective, the portfolio constructed by DeepUnifiedMom(CAN) should
outperform the portfolios constructed using both the DeepUnifiedMom(EQWT)
and DeepUnifiedMom(MVO) methods.
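To make the two baseline constructions concrete, the following sketch
contrasts equal weighting with a max-Sharpe MVO step applied to three
sub-portfolio return series. It is an illustration under stated assumptions,
not the paper's implementation: the return series are made-up numbers, the
function names are hypothetical, and the max-Sharpe solution is the textbook
unconstrained tangency portfolio (weights proportional to the inverse
covariance matrix times the mean returns, with a zero risk-free rate). Any
constraints used in the paper, such as long-only weights, are not specified
here.

```python
from statistics import mean


def covariance_matrix(series):
    """Sample covariance matrix of equally long return series."""
    n = len(series[0])
    mus = [mean(s) for s in series]
    return [[sum((a - mi) * (b - mj) for a, b in zip(si, sj)) / (n - 1)
             for sj, mj in zip(series, mus)]
            for si, mi in zip(series, mus)]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def equal_weights(k):
    """Heuristic baseline: no estimation, no optimization."""
    return [1.0 / k] * k


def max_sharpe_weights(series):
    """Second-stage MVO: estimate mu and Sigma, then normalize inv(Sigma) @ mu."""
    mu = [mean(s) for s in series]
    raw = solve(covariance_matrix(series), mu)
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical daily returns standing in for the Fast, Medium and Slow portfolios.
fast = [0.0010, -0.0020, 0.0015, 0.0005, -0.0010, 0.0020]
medium = [0.0005, 0.0010, -0.0005, 0.0010, 0.0000, 0.0005]
slow = [0.0010, 0.0000, 0.0010, -0.0005, 0.0015, 0.0005]

w_eq = equal_weights(3)
w_mvo = max_sharpe_weights([fast, medium, slow])
print("EQWT weights:", w_eq)
print("MVO weights: ", [round(w, 3) for w in w_mvo])
```

In both baselines the three sub-portfolios are taken as fixed inputs, so
nothing learned in the second stage can feed back into how the sub-portfolios
themselves are built. This is the fragmentation the paragraph above refers to.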


4.4. Backtest Specifications 
In the following section, we present the backtest results obtained from
our proposed model, trained using an expanding window cross-validation
approach with the first batch of training data spanning a period of 10 years,
as illustrated in Figure 3. Throughout the training phase, 20% of the data
was allocated for validation. The trained models were then used to construct
portfolios for the test set, with each portfolio corresponding to one year
of out-of-sample data. This process was repeated 24 times, resulting in an
out-of-sample backtest period from January 2000 to December 2023.


Figure 3: The diagram illustrates the expanding window cross-validation
approach used in our model’s training.
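The expanding window scheme described above can be sketched as a list of
yearly folds. This is a minimal illustration, not the authors' code: the
function names are hypothetical, the start year of 1990 is taken from the
training period mentioned later in this section, and holding out the most
recent 20% of each training window for validation is an assumption, since
the text does not say how the validation data is selected.

```python
def expanding_window_folds(first_train_year=1990,
                           first_test_year=2000,
                           last_test_year=2023):
    """One fold per test year; the training window always starts at
    first_train_year and grows by one year with each fold."""
    return [
        {"train": (first_train_year, test_year - 1), "test": test_year}
        for test_year in range(first_test_year, last_test_year + 1)
    ]


def split_train_validation(samples, validation_fraction=0.2):
    """Assumed chronological split: the last 20% of samples is held out."""
    cut = int(round(len(samples) * (1 - validation_fraction)))
    return samples[:cut], samples[cut:]


folds = expanding_window_folds()
print(len(folds), folds[0], folds[-1])

train_part, val_part = split_train_validation(list(range(10)))
print(train_part, val_part)
```

The first fold trains on 1990 to 1999 (10 years) and tests on 2000. The last
fold tests on 2023, which gives the 24 out-of-sample years of the backtest.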


Parameters                            Values
Number of LSTM Layers                 1, 2, 3
LSTM Hidden Units                     64, 126, 252, 512
Number of LSTM Experts                3, 6, 9, 12
Task-Specific Network Layers          2, 3, 4
Task-Specific Network Hidden Units    64, 126, 252, 512


Table 1: Hyperparameter Search Space. 
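For a sense of scale, the search space in Table 1 can be enumerated as a
Cartesian product. The sketch below is only an illustration: the dictionary
keys are hypothetical names, and it assumes an exhaustive grid over all five
parameters.

```python
from itertools import product

# Values copied from Table 1; the key names are illustrative only.
SEARCH_SPACE = {
    "lstm_layers": [1, 2, 3],
    "lstm_hidden_units": [64, 126, 252, 512],
    "num_experts": [3, 6, 9, 12],
    "task_layers": [2, 3, 4],
    "task_hidden_units": [64, 126, 252, 512],
}


def grid(space):
    """Return every combination of the listed values as a config dict."""
    keys = list(space)
    return [dict(zip(keys, values))
            for values in product(*(space[k] for k in keys))]


configs = grid(SEARCH_SPACE)
print(len(configs))  # 3 * 4 * 4 * 3 * 4 = 576
```

Under this assumption, each of the 576 configurations would be trained and
then compared on the validation set.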


During the backtesting phase, our model was trained on the designated
training dataset. We used the Adam optimizer, a stochastic gradient-based
method, to minimize the loss functions. We conducted a grid search on the
validation set, guided by the parameter search space outlined in Table 1.
Training was capped at 20 epochs, with an early stopping mechanism that
halts training if there is no improvement in validation loss over 5
consecutive epochs. Using the expanding window approach for out-of-sample
training and validation, the final training iteration, covering data from
January 1990 to December 2023, required approximately 1 hour on a system
equipped with an NVIDIA GeForce RTX 2090.
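The stopping rule described above (at most 20 epochs, halting after 5
consecutive epochs without improvement in validation loss) can be sketched
independently of any deep learning library. The function below is a
hypothetical helper, not the authors' training loop. `run_epoch` stands in
for one epoch of training followed by validation, and a tie with the best
loss is treated as no improvement, which is an assumption.

```python
def train_with_early_stopping(run_epoch, max_epochs=20, patience=5):
    """Call run_epoch(epoch) -> validation loss until max_epochs is reached
    or the loss has not improved for `patience` consecutive epochs."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_run = 0
    for epoch in range(1, max_epochs + 1):
        val_loss = run_epoch(epoch)
        epochs_run = epoch
        if val_loss < best_loss:
            best_loss, best_epoch = val_loss, epoch
        elif epoch - best_epoch >= patience:
            break  # 5 epochs in a row without improvement
    return epochs_run, best_epoch, best_loss


# Simulated validation losses: the best value occurs at epoch 3.
losses = [1.00, 0.90, 0.80, 0.85, 0.86, 0.87, 0.88, 0.89, 0.70, 0.60]
result = train_with_early_stopping(lambda epoch: losses[epoch - 1])
print(result)  # (8, 3, 0.8)
```

In the simulated run, training stops after epoch 8 and never reaches the
later, lower losses. This is the usual trade-off of patience-based stopping.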


5. Performance Evaluation 


As shown in Table 2, our final unified momentum portfolio, generated by
DeepUnifiedMom(CAN), outperforms all TSMOM benchmark models, the
DeepUnifiedMom-Fast, Medium, and Slow portfolios, and the final portfolios
generated by DeepUnifiedMom(EQWT) and DeepUnifiedMom(MVO) in terms of
Sharpe and Sortino ratios. DeepUnifiedMom(CAN) achieves a Sharpe ratio of
2.33 and a Sortino ratio of 3.81 while incurring a maximum drawdown of
-1.02%. In comparison, the best TSMOM benchmark strategy, TSMOM(1,12),
achieves a Sharpe ratio of 1.07 and a Sortino ratio of 1.58, with a maximum
drawdown of -2.01%. This demonstrates the superior performance of
DeepUnifiedMom(CAN) over the TSMOM benchmarks in terms of both
risk-adjusted returns and drawdown management.

Additionally, when comparing the portfolios generated by the task-specific
networks DeepUnifiedMom-Fast, Medium, and Slow to the final unified
momentum portfolio, the latter consistently outperforms the former on a
risk-adjusted basis and achieves a much lower maximum drawdown. The
best-performing task-specific network, DeepUnifiedMom(Slow), achieves a
Sharpe ratio of 1.54 and a Sortino ratio of 2.41 while incurring a maximum
drawdown of -3.62%, roughly 3.5 times the maximum drawdown incurred by the
final unified momentum portfolio. When we compare the drawdowns of
DeepUnifiedMom-Fast, DeepUnifiedMom-Medium, and DeepUnifiedMom-Slow
against the TSMOM benchmark strategies, the results are less appealing.
DeepUnifiedMom-Fast, Medium, and Slow incur higher maximum drawdowns than
nearly all TSMOM strategies (only TSMOM(1), at -4.60%, draws down more
than Medium and Slow). Running a portfolio that incurs significantly larger
drawdown risk without corresponding performance compensation may not be an
appealing strategy for many portfolio managers.

Overall, the performance of the final unified momentum portfolio generated
by DeepUnifiedMom(CAN), compared to the benchmark TSMOM strategies and
DeepUnifiedMom-Fast, Medium, and Slow, supports the claim that a portfolio
capable of capitalizing on a spectrum of momentum opportunities is more
robust, delivering better risk-adjusted performance and a lower maximum
drawdown.


Portfolios               Ann. Return (%)  Ann. Vol. (%)  Sharpe  Sortino  Max DD (%)
TSMOM(1)                 1.06             1.44           0.73    1.09     -4.60
TSMOM(3)                 1.16             1.46           0.80    1.16     -3.37
TSMOM(6)                 1.18             1.45           0.82    1.20     -2.91
TSMOM(12)                1.48             1.47           1.01    1.46     -3.28
TSMOM(1,4)               1.27             1.27           1.00    1.49     -2.05
TSMOM(5,8)               1.19             1.37           0.87    1.27     -2.80
TSMOM(9,12)              1.40             1.40           1.00    1.45     -2.60
TSMOM(1,12)              1.30             1.22           1.07    1.58     -2.01
DeepUnifiedMom(Fast)     1.85             1.39           1.34    2.06     -6.32
DeepUnifiedMom(Medium)   1.32             1.34           0.99    1.47     -3.52
DeepUnifiedMom(Slow)     2.14             1.40           1.54    2.41     -3.62
DeepUnifiedMom(CAN)      1.92             0.82           2.33    3.81     -1.02
DeepUnifiedMom(EQWT)     1.79             0.77           2.31    3.71     -0.99
DeepUnifiedMom(MVO)      1.91             1.11           1.72    2.69     -3.13


Table 2: Backtest results (net) for the period from January 2000 to December 2023, with 
transaction costs set at 3 basis points. The DeepUnifiedMom model results presented here 
were obtained using the Sharpe Ratio with a Soft Capping mechanism as the loss function 
during training. Max DD stands for maximum drawdown. 
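For readers who want to reproduce this kind of summary, the sketch below
computes the five reported statistics from a series of periodic returns. It
is a minimal illustration under assumptions that this section does not
confirm: 252 periods per year, arithmetic annualization, a zero risk-free
rate, a Sortino ratio based on downside deviation relative to zero, and
maximum drawdown measured on the compounded equity curve. The function name
and the sample returns are hypothetical.

```python
import math


def performance_metrics(returns, periods_per_year=252):
    """Annualized return, volatility, Sharpe, Sortino and maximum drawdown."""
    n = len(returns)
    mean_r = sum(returns) / n
    variance = sum((r - mean_r) ** 2 for r in returns) / (n - 1)
    # Downside deviation: only negative returns contribute.
    downside = math.sqrt(sum(min(r, 0.0) ** 2 for r in returns) / n)

    ann_return = mean_r * periods_per_year
    ann_vol = math.sqrt(variance * periods_per_year)
    ann_downside = downside * math.sqrt(periods_per_year)

    # Maximum drawdown: worst peak-to-trough loss of the equity curve.
    equity, peak, max_dd = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        max_dd = min(max_dd, equity / peak - 1.0)

    return {
        "ann_return": ann_return,
        "ann_vol": ann_vol,
        "sharpe": ann_return / ann_vol,
        "sortino": ann_return / ann_downside,
        "max_dd": max_dd,
    }


sample = [0.01, -0.02, 0.015, -0.005, 0.01]  # made-up returns
metrics = performance_metrics(sample)
print({k: round(v, 4) for k, v in metrics.items()})
```

The Sharpe ratios in Table 2 are close to the ratio of annualized return to
annualized volatility (for example, 1.92 / 0.82 ≈ 2.34 against the reported
2.33 for DeepUnifiedMom(CAN)). This is consistent with a zero risk-free
rate, although that is an inference rather than something stated here.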


Figure 5 shows that the weights assigned by DeepUnifiedMom(CAN) to the
DeepUnifiedMom Fast, Medium, and Slow portfolios remain remarkably
consistent over time. Moreover, the weight allocation implemented by
DeepUnifiedMom(CAN) diverges markedly from equal weighting, demonstrating
that DeepUnifiedMom(CAN) does not rely on a basic equal weighting scheme.
Nevertheless, DeepUnifiedMom(CAN) outperforms DeepUnifiedMom(EQWT) only by
a narrow margin, achieving a Sharpe ratio of 2.33 compared to 2.31 and a
Sortino ratio of 3.81 compared to 3.71, while DeepUnifiedMom(EQWT) incurs a
slightly smaller maximum drawdown of -0.99%. Both the DeepUnifiedMom(CAN)
and DeepUnifiedMom(EQWT) portfolios considerably outperform the
DeepUnifiedMom(MVO) approach in terms of risk-adjusted returns and maximum
drawdown. The Sharpe ratio of the DeepUnifiedMom(MVO) portfolio is only
1.72, well below the 2.33 of the DeepUnifiedMom(CAN) portfolio. Its maximum
drawdown stands at -3.13%, which is less favorable than the drawdowns
experienced by the equal-weighted TSMOM portfolios TSMOM(1,4), TSMOM(5,8),
TSMOM(9,12), and TSMOM(1,12). These findings suggest that the
DeepUnifiedMom(MVO) approach may be suboptimal, given that both
DeepUnifiedMom(CAN) and DeepUnifiedMom(EQWT) perform much better. While
criticism of MVO is well-established, our results lend further support to
these critiques, emphasizing the need for ongoing refinement and
exploration of alternative strategies (Michaud & Michaud, 2007; Bailey &
Lopez de Prado, 2012; Jurczenko & Teiletche, 2015; Lopez de Prado, 2016).


(d) DeepUnifiedMom(CAN) versus Baseline Portfolio Allocations: Equal Weight and MVO.


Figure 4: Portfolios’ Cumulative Returns (%) from January 2000 to December 2023.


Figure 5: The figure illustrates the portfolio weight allocations determined by the Capital
Allocation Task Specific Network within the DeepUnifiedMom framework across the Fast,
Medium, and Slow portfolios from January 2000 to December 2023.


In conclusion, the performance analysis of DeepUnifiedMom(CAN) in
comparison to the benchmark strategies and the DeepUnifiedMom Fast, Medium,
and Slow portfolios underscores its effectiveness in portfolio management.
By outperforming the TSMOM benchmarks and task-specific networks in terms
of risk-adjusted returns and drawdown management, DeepUnifiedMom(CAN)
demonstrates its ability to capitalize on momentum opportunities across a
spectrum of market conditions. This suggests that a unified approach to
portfolio construction, such as DeepUnifiedMom(CAN), leads to a more robust
portfolio, contributing to improved risk-adjusted performance and a lower
maximum drawdown. These findings highlight the potential of
DeepUnifiedMom(CAN) as a practical and promising solution for investors
seeking enhanced portfolio performance in dynamic market environments.

Table 3 reveals that when DeepUnifiedMom(CAN) is trained using the Sharpe
Ratio with a Soft Capping Mechanism, the resulting portfolios consistently
perform better than those trained using only the Sharpe Ratio.
Specifically, DeepUnifiedMom(CAN) trained with the modified Sharpe Ratio
achieved a Sharpe Ratio of 2.33 and a Sortino Ratio of 3.81, compared to
2.14 and 3.43, respectively, when trained with the standard Sharpe Ratio.
Additionally, the task-specific networks constructing the Fast, Medium,
and Slow portfolios also outperform their counterparts. Overall, the
results are promising, indicating that further research into improved
Sharpe Ratio objective functions for training deep learning models is
worthwhile.


Portfolios               Ann. Return (%)  Ann. Vol. (%)  Sharpe  Sortino  Max DD (%)

Sharpe Ratio with Soft Capping Mechanism (Threshold = 0.01)
DeepUnifiedMom(Fast)     1.85             1.39           1.34    2.06     -6.32
DeepUnifiedMom(Medium)   1.32             1.34           0.99    1.47     -3.52
DeepUnifiedMom(Slow)     2.14             1.40           1.54    2.41     -3.62
DeepUnifiedMom(CAN)      1.92             0.82           2.33    3.81     -1.02
DeepUnifiedMom(EQWT)     1.79             0.77           2.31    3.71     -0.99
DeepUnifiedMom(MVO)      1.91             1.11           1.72    2.69     -3.13

Sharpe Ratio
DeepUnifiedMom(Fast)     1.73             1.42           1.22    1.87     -6.33
DeepUnifiedMom(Medium)   1.20             1.39           0.87    1.28     -3.61
DeepUnifiedMom(Slow)     2.03             1.43           1.42    2.20     -4.00
DeepUnifiedMom(CAN)      1.80             0.84           2.14    3.43     -1.10
DeepUnifiedMom(EQWT)     1.67             0.79           2.11    3.33     -1.21
DeepUnifiedMom(MVO)      1.72             1.17           1.47    2.26     -3.64


Table 3: Backtest metrics (net) for the period from January 2000 to December
2023, with transaction costs set at 3 basis points. The upper panel reports
DeepUnifiedMom models trained with the Sharpe Ratio with a Soft Capping
mechanism as the loss function; the lower panel reports models trained with
the standard Sharpe Ratio.


6. Conclusion 


The proposed DeepUnifiedMom framework represents a significant advancement
in applying deep learning to portfolio management, addressing the
limitations of traditional momentum strategies. At its core, DeepUnifiedMom
employs a multi-task learning approach and a multi-gate mixture of experts
to construct unified momentum portfolios. Our extensive backtesting,
spanning asset classes such as equity indexes, bonds, currencies, and
commodities, shows that DeepUnifiedMom consistently surpasses the
benchmarks, maintaining its superior performance even after accounting for
transaction costs. This highlights its ability to construct, in an
end-to-end fashion, a final portfolio that accounts for a wide spectrum of
momentum opportunities in financial markets. This approach to using deep
learning in investment strategies showcases the benefits of advanced
computational techniques in financial decision-making. The model was
developed using Python and PyTorch, and its source code is publicly
available online at https://github.com/joelowj/unified_mom_mmoe. Future
research will focus on enhancing the DeepUnifiedMom framework by
incorporating sparsity into the gating mechanism and using more
sophisticated deep learning architectures, such as the Transformer model,
for time-series analysis. Additionally, efforts will be made to integrate
explainable AI techniques into the portfolio construction process, aiming
to increase the transparency and interpretability of the DeepUnifiedMom
framework. This progression will not only refine the model’s performance
but also bolster user trust and understanding of how AI-driven decisions
are made within the portfolio management context.